Team-Maxmin Equilibrium: Efficiency Bounds and Algorithms

Authors

  • Nicola Basilico
  • Andrea Celli
  • Giuseppe De Nittis
  • Nicola Gatti
Abstract

The Team-maxmin equilibrium prescribes the optimal strategies for a team of rational players who share the same goal and cannot correlate their strategies in strategic games against an adversary. This solution concept can capture situations in which an agent controls multiple resources (corresponding to the team members) that cannot communicate. It is known that such an equilibrium always exists and is unique (except in degenerate cases); these properties make it a credible solution concept for real-world applications, especially in security scenarios. Nevertheless, to the best of our knowledge, the Team-maxmin equilibrium is almost completely unexplored in the literature. In this paper, we investigate bounds on the (in)efficiency of the Team-maxmin equilibrium w.r.t. the Nash equilibria and w.r.t. the Maxmin equilibrium when the team members can play correlated strategies. Furthermore, we study a number of algorithms to find and/or approximate an equilibrium, discussing their theoretical guarantees and evaluating their performance on a standard testbed of game instances.

Introduction

The computational study of game-theoretic solution concepts is among the most important challenges addressed in the last decade of Computer Science (Deng, Papadimitriou, and Safra 2002). These problems have acquired particular relevance in Artificial Intelligence, where the goal is to design physical or software agents that must behave optimally in strategic situations. In addition to the well-known Nash equilibrium (Nash 1951), other solution concepts have received attention in the Artificial Intelligence literature thanks to their application in security domains. Examples include the Maxmin equilibrium for zero-sum games under various forms of constraints over the actions of the players (Jain et al. 2010) and the Stackelberg (a.k.a. leader-follower) equilibrium (Conitzer and Sandholm 2006).
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

While a large part of the literature focuses on 2-player games, few results are known about games with more players, except for games with a very specific structure, e.g., congestion games (Nisan et al. 2007). In this paper, we focus on the Team-maxmin equilibrium proposed by (von Stengel and Koller 1997). It applies to zero-sum games between a team and an adversary. The team is defined as a set of players with the same utility function $U_T$ and without the capability of synchronizing their actions. The adversary is a single player with utility function $-U_T$. These games can model many realistic security scenarios, for example those where multiple non-coordinating agents share the common objective of defending an environment against a malicious attacker. In (Jiang et al. 2013), a security setting of this type is studied and the price of mis-coordination in the proposed security games is analyzed. The Team-maxmin equilibrium also plays a crucial role in infinitely repeated games and role assignment problems (Moon and Conitzer 2016), where it is necessary to compute threat points. The current approach to these problems, in games with more than two players, is to consider the correlated threat point (Kontogiannis and Spirakis 2008) or to employ approximation algorithms that avoid the use of linear programming (Andersen and Conitzer 2013). Our techniques allow the computation of punishment strategies (leading to the threat points) in the general scenario in which players, other than the defector, cannot coordinate strategy execution. The study of the Team-maxmin equilibrium is almost completely unexplored. It is known that it always exists, it is unique except for degeneracies, and it is the best Nash equilibrium for the team, but, to the best of our knowledge, only two computational works deal with this solution concept.
(Borgs et al. 2010) show that the Minmax value (equivalently, the Team-maxmin value) is inapproximable in additive sense within $3/m^2$ even in 3-player games with $m$ actions per player and binary payoffs (but nothing is known about membership in the APX class or some superclass); (Hansen et al. 2008) strengthen the previous complexity result and provide a quasi-polynomial-time $\epsilon$-approximation (in additive sense) algorithm. Only (Lim 1997; Alpern and Lim 1998) deal with the mathematical derivation for a specific class of games with an adversary, i.e., rendezvous-evasion games. By contrast, a number of works deal with team games without an adversary. We cite just a few for the sake of completeness. Team games were first proposed in (Palfrey and Rosenthal 1983) as voting games, then studied in repeated and absorbing games to understand the interaction among the players (Bornstein, Erev, and Goren 1994; Bornstein, Winter, and Goren 1996; Bornstein, Budescu, and Zamir 1997; Solan 2000) and more recently in Markov games with noisy payoffs (Wang and Sandholm 2002).

arXiv:1611.06134v1 [cs.AI] 18 Nov 2016

Original contributions

We provide two main contributions. First, we study the relationship, in terms of efficiency for the team, between the Nash equilibrium (i.e., when players are not teammates), the Team-maxmin equilibrium, and the Correlated-team maxmin equilibrium (i.e., the Maxmin equilibrium when all the team members can play correlated strategies and can therefore synchronize the execution of their actions). We show that, even in the same instances with binary payoffs, the worst Nash equilibrium may be arbitrarily worse than the Team-maxmin equilibrium which, in turn, may be arbitrarily worse (in this case only asymptotically) than the Correlated-team maxmin equilibrium.
We provide exact bounds for the inefficiency and we design an algorithm that, given a correlated strategy of the team, returns in polynomial time a mixed strategy of the team minimizing the worst-case ratio between the utility given by the correlated strategy and the utility given by the mixed strategy. Second, we provide some algorithms to find and/or approximate the Team-maxmin equilibrium, we discuss their theoretical guarantees, and we evaluate them in practice by means of a standard testbed (Nudelman et al. 2004). We also identify the limits of such algorithms and discuss which ones are best adopted depending on the instance to be solved. For the sake of presentation, the proofs of the theorems are presented in the Appendices.

Preliminaries

A normal-form game is a tuple $(N, A, U)$ where: $N = \{1, 2, \ldots, n\}$ is the set of players; $A = \times_{i \in N} A_i$ is the set of action profiles, where $A_i = \{a_1, a_2, \ldots, a_{m_i}\}$ is the set of player $i$'s actions; $U = \{U_1, U_2, \ldots, U_n\}$ is the set of utility functions, where $U_i : A \to \mathbb{R}$ is the utility function of player $i$. A strategy profile is defined as $s = (s_1, s_2, \ldots, s_n)$, where $s_i \in \Delta(A_i)$ is player $i$'s mixed strategy and $\Delta(A_i)$ is the set of all the probability distributions over $A_i$. As customary, $-i$ denotes the set containing all the players except player $i$. We study games in which the set of players $T = \{1, 2, \ldots, n-1\}$ constitutes a team whose members have the same utility function $U_T$. Player $n$ is an adversary of the team and her utility function is $-U_T$. When the teammates cannot coordinate at all (no player can communicate with the others and each player takes decisions independently), the appropriate solution concept is the Nash equilibrium, which prescribes a strategy profile where each player $i$'s strategy $s_i$ is a best response to $s_{-i}$. In 2-player zero-sum games, a Nash equilibrium is a pair of Maxmin/Minmax strategies and can be computed in polynomial time.
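The best-response condition that defines a Nash equilibrium can be checked numerically on a small game. The following is a minimal Python sketch, not taken from the paper: the 3-player game with binary payoffs, the 0-based player and action indices, and all function names are illustrative assumptions.

```python
from itertools import product


def expected_utility(utility, profile):
    """Expected value of utility(a) when every player i draws her action
    independently from the mixed strategy profile[i]."""
    total = 0.0
    for joint in product(*(range(len(s)) for s in profile)):
        prob = 1.0
        for i, a in enumerate(joint):
            prob *= profile[i][a]
        total += prob * utility(joint)
    return total


def is_nash(utilities, profile, tol=1e-9):
    """True if no player can gain more than tol by a pure deviation.
    Checking pure deviations is enough, because a mixed deviation is a
    convex combination of pure ones."""
    for i, u in enumerate(utilities):
        current = expected_utility(u, profile)
        for a in range(len(profile[i])):
            pure = [0.0] * len(profile[i])
            pure[a] = 1.0
            deviated = list(profile)
            deviated[i] = pure
            if expected_utility(u, deviated) > current + tol:
                return False
    return True


# Hypothetical game (an assumption for illustration): teammates 0 and 1
# earn 1 when they pick the same action and the adversary (player 2)
# picks the other one; the adversary's utility is -U_T.
def team_utility(joint):
    a1, a2, a_adv = joint
    return 1.0 if a1 == a2 and a1 != a_adv else 0.0


def adversary_utility(joint):
    return -team_utility(joint)


utilities = [team_utility, team_utility, adversary_utility]
uniform = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
mismatched = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

print(is_nash(utilities, uniform))      # every player is indifferent
print(is_nash(utilities, mismatched))   # teammate 0 gains by switching
```

In this toy game the uniform profile passes the check, while the profile in which the two teammates deterministically mismatch does not, since the first teammate can profitably switch her action.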
In arbitrary games, the computation of a Nash equilibrium is PPAD-complete even when the number of players is fixed (Daskalakis, Goldberg, and Papadimitriou 2009). Instead, when the teammates can coordinate, we distinguish two forms of coordination: correlated, in which a correlating device decides a joint action (i.e., an action profile specifying one action per teammate) and then communicates to each teammate her action, and non-correlated, in which each player plays independently of the others. When the coordination is non-correlated, players are unable to correlate their actions, and their strategy $s_i$ is mixed, as defined above for a generic normal-form game. In other words, teammates can jointly decide their strategies, but they cannot synchronize their actions, which must then be drawn independently. The appropriate solution concept for such a setting is the Team-maxmin equilibrium. A Team-maxmin equilibrium is a Nash equilibrium with the properties of being unique (except for degeneracies) and the best one for the team. These properties are very appealing in real-world settings, since they make it possible to avoid the equilibrium selection problem which affects the Nash equilibrium. In security applications, for instance, the uniqueness of the equilibrium makes it possible to perfectly forecast the behavior of the attacker (adversary). When the number of players is given, finding a Team-maxmin equilibrium is FNP-hard and the Team-maxmin value is inapproximable in additive sense even when the payoffs are binary (Hansen et al. 2008). [1] In (Hansen et al. 2008), the authors provide a quasi-polynomial-time $\epsilon$-approximation (in additive sense) algorithm. Furthermore, a Team-maxmin equilibrium may contain irrational probabilities even with 2 teammates and 3 different payoff values. [2] No experimental evaluation of algorithms for finding the Team-maxmin equilibrium is known.
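To make the non-correlated setting concrete, the sketch below evaluates a team's independent mixed strategies against the adversary's best pure reply and approximates the Team-maxmin value by brute force. It is only an illustration under stated assumptions: the toy game, the restriction to 2 teammates with 2 actions each, the grid resolution, and the function names are all hypothetical, and a grid search is not one of the algorithms studied in the paper.

```python
from itertools import product


def worst_case_team_value(team_utility, team_profile, adversary_actions):
    """Team's expected utility against the adversary's best pure reply,
    when teammates draw their actions independently from team_profile."""
    values = []
    for a_adv in range(adversary_actions):
        total = 0.0
        for joint in product(*(range(len(s)) for s in team_profile)):
            prob = 1.0
            for i, a in enumerate(joint):
                prob *= team_profile[i][a]
            total += prob * team_utility(joint + (a_adv,))
        values.append(total)
    return min(values)


def grid_search_team_maxmin(team_utility, steps=20):
    """Brute-force approximation for 2 teammates with 2 actions each:
    try every pair (p, q) on a uniform grid and keep the best worst case."""
    best_value, best_profile = float("-inf"), None
    for i, j in product(range(steps + 1), repeat=2):
        p, q = i / steps, j / steps
        profile = [[p, 1 - p], [q, 1 - q]]
        value = worst_case_team_value(team_utility, profile, 2)
        if value > best_value:
            best_value, best_profile = value, profile
    return best_value, best_profile


# Hypothetical binary-payoff game (an assumption for illustration): the
# team wins when both teammates match and the adversary picks the other action.
def team_utility(joint):
    a1, a2, a_adv = joint
    return 1.0 if a1 == a2 and a1 != a_adv else 0.0


value, profile = grid_search_team_maxmin(team_utility)
print(value, profile)
```

The inner minimum only ranges over the adversary's pure actions because, once the team's strategies are fixed, the adversary's expected utility is linear in her own mixed strategy. The outer maximization is the non-convex part, which is why the exhaustive grid does not scale beyond tiny instances.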
When players can synchronize their actions, the team strategy is said to be correlated. Given the set of team action profiles defined as $A_T = \times_{i \in T} A_i$, a correlated team strategy is defined as $p \in \Delta(A_T)$. In other words, teammates can jointly decide and execute their strategy. The team is then equivalent to a single player whose actions are joint team action profiles. In such a case, the appropriate solution concept for the team and the adversary is a pair of Maxmin/Minmax strategies that, for the sake of clarity, we call in this paper the Correlated-team maxmin equilibrium. This equilibrium can be found by means of linear programming, since it can be formulated as a maxmin problem in which the max player's action space is given by the Cartesian product of the action spaces of the teammates. Notice that the size of the input is exponential in the number of teammates, and therefore approximation algorithms for games with many team members are necessary in practice. Furthermore, the price (in terms of inefficiency) paid by a team due to the inability to synchronize the execution of its members' actions is not known. Knowing it would clarify how inefficient the Team-maxmin equilibrium is w.r.t. the Correlated-team maxmin equilibrium or, equivalently, how well the Team-maxmin equilibrium approximates the Correlated-team maxmin equilibrium. Another open problem is studying the gain a set of players sharing the same goal would have in forming a team and coordinating their mixed strategies (i.e., how inefficient the Nash equilibrium is w.r.t. the Team-maxmin equilibrium or, equivalently, how well the Nash equilibrium approximates the Team-maxmin equilibrium).

[1] Rigorously speaking, (Hansen et al. 2008) studies the Minmax strategy when there is a single max player and multiple min players. The problem of finding the Team-maxmin equilibrium in zero-sum adversarial team games can be formulated as the problem of finding such a Minmax strategy and vice versa.
[2] The proof, provided in (Hansen et al. 2008), contains a minor flaw. In the Appendices, we provide a correct revision of the proof with all the calculations omitted in the original proof.

Nash, Team-maxmin, and Correlated-team maxmin equilibria

We study the relationships between the Nash equilibrium and the Team-maxmin equilibrium in terms of efficiency for the team. In our analysis, we resort to the concept of Price of Anarchy (POA), showing that the Nash equilibrium (precisely, the worst-case Nash equilibrium) may be arbitrarily inefficient w.r.t. the Team-maxmin equilibrium, which corresponds to the best Nash equilibrium for the team. In this case the POA measures the inefficiency that a group of players with the same goal would incur if they did not form a team. To obtain results coherent with the definition of POA, we consider games with payoffs in the range $[0, 1]$. Our results hold without loss of generality since, given any arbitrary game, we can produce an equivalent game in which the payoffs are in such a range by using an affine transformation. Furthermore, for the sake of presentation, we consider only games in which $m_1 = \ldots = m_n = m$. The generalization of our results to players with different numbers of actions is straightforward.

Theorem 1 The Price of Anarchy (POA) of the Nash equilibrium w.r.t. the Team-maxmin equilibrium may be $\mathrm{POA} = \infty$ even in games with 3 players (2 teammates), 2 actions per player, and binary payoffs.

In order to evaluate the inefficiency of the Team-maxmin equilibrium w.r.t. the Correlated-team maxmin equilibrium, we introduce a new index similar to the mediation value proposed in (Ashlagi, Monderer, and Tennenholtz 2008) and following the same rationale as the POA.
We call such an index the Price of Uncorrelation (POU) and we define it as the ratio between the team's utility provided by the Correlated-team maxmin equilibrium and that provided by the Team-maxmin equilibrium. POU measures the inefficiency due to the impossibility, for the teammates, of synchronizing the execution of their strategies.

Definition 1 Let us consider an $n$-player game. The Price of Uncorrelation (POU) is defined as $\mathrm{POU} = \frac{v^{team}_C}{v^{team}_M} \geq 1$, where $v^{team}_C$ is the Correlated-team maxmin value of the team and $v^{team}_M$ is the Team-maxmin value of the team.

We initially provide a lower bound on the worst-case POU.

Theorem 2 The POU of the Team-maxmin equilibrium w.r.t. the Correlated-team maxmin equilibrium may be $\mathrm{POU} = m^{n-2}$ even in games with binary payoffs.

Now, we provide an upper bound on the worst-case POU.

Theorem 3 Given any $n$-player game and a Correlated-team maxmin equilibrium with a utility of $v$ for the team, it is always possible to find in polynomial time a mixed strategy profile for the team providing a utility of at least $\frac{v}{m^{n-2}}$ to the team, and therefore POU is never larger than $m^{n-2}$.

We observe that Theorem 2 shows that the worst-case POU is at least $m^{n-2}$, while Theorem 3 shows that POU cannot be larger than $m^{n-2}$. Therefore, POU is arbitrarily large only asymptotically. In other words, $\mathrm{POU} = \infty$ only when $m$ or $n$ goes to $\infty$. [3] More importantly, the proof of Theorem 3 provides a polynomial-time algorithm to find a mixed strategy of the team given a correlated strategy, and this algorithm is the best possible in terms of worst-case minimization of POU. The algorithm is simple and computes mixed strategies for the team members as follows. Given the Correlated-team maxmin equilibrium $p \in \Delta(A_1 \times \ldots \times A_{n-1})$, the mixed strategy of player 1 ($s_1$) is such that each action $a_1$ is played with the probability that $a_1$ is chosen in $p$, that is, $s_1(a_1) = \sum_{a_{-1} \in A_{-1}} p(a_1, a_{-1})$.
Every other team member $i \in N \setminus \{1, n\}$ plays uniformly over the actions she plays with strictly positive probability in $p$. Since the computation of a Correlated-team maxmin equilibrium can be done in polynomial time, such an algorithm is a polynomial-time approximation algorithm for the Team-maxmin equilibrium. Furthermore, notice that POU rises polynomially in the number of actions $m$ and exponentially in the number of players $n$. Interestingly, the instances used in the proof of Theorem 2 generalize the instances used in the proof of Theorem 1. Indeed, it can be observed that the POA of the Nash equilibrium w.r.t. the Team-maxmin equilibrium is $\infty$ in the instances used in the proof of Theorem 2. Therefore, there are instances in which the worst Nash equilibrium is arbitrarily worse than the Team-maxmin equilibrium and, in turn, the Team-maxmin equilibrium is arbitrarily worse (in this case only asymptotically) than the Correlated-team maxmin equilibrium. For the sake of completeness, we state the following result, showing the lower bound of POU.

Theorem 4 The POU of the Team-maxmin equilibrium w.r.t. the Correlated-team maxmin equilibrium may be $\mathrm{POU} = 1$ even in games with binary payoffs.

[3] A more accurate bound can be obtained by substituting $m$ with the size of the equilibrium support, showing that the inefficiency increases as the equilibrium support increases.

Algorithms to find and/or approximate a Team-maxmin equilibrium

In the following, we describe four algorithms to find/approximate the Team-maxmin equilibrium.

Global optimization

The problem of finding the Team-maxmin equilibrium can be formulated as a non-linear non-convex mathematical program as follows:

Algorithm 1 SupportEnumeration
$v^* = +\infty$
for all $i \in T$ do
    $P_i = \{(V_i^1, m_i^1), (V_i^2, m_i^2), \ldots \mid \forall j,\ V_i^j \subseteq A_i,\ \sum \ldots$
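The rounding procedure described after Theorem 3 (player 1 keeps her marginal under $p$, every other teammate mixes uniformly over her support in $p$) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the dictionary representation of $p$, the 0-based indices, the function names, and the toy game are hypothetical and are not taken from the paper.

```python
from itertools import product


def round_correlated_to_mixed(p, num_actions):
    """p maps team joint actions (tuples) to probabilities;
    num_actions[i] is the number of actions of teammate i."""
    # The first teammate keeps her marginal distribution under p.
    s1 = [0.0] * num_actions[0]
    for joint, prob in p.items():
        s1[joint[0]] += prob
    strategies = [s1]
    # Every other teammate plays uniformly over the actions that
    # have strictly positive probability for her in p.
    for i in range(1, len(num_actions)):
        support = sorted({joint[i] for joint, prob in p.items() if prob > 0})
        s = [0.0] * num_actions[i]
        for a in support:
            s[a] = 1.0 / len(support)
        strategies.append(s)
    return strategies


def correlated_value(team_utility, p, adversary_actions):
    """Worst-case team utility of the correlated strategy p."""
    return min(
        sum(prob * team_utility(joint + (a_adv,)) for joint, prob in p.items())
        for a_adv in range(adversary_actions)
    )


def mixed_value(team_utility, strategies, adversary_actions):
    """Worst-case team utility when teammates randomize independently."""
    values = []
    for a_adv in range(adversary_actions):
        total = 0.0
        for joint in product(*(range(len(s)) for s in strategies)):
            prob = 1.0
            for i, a in enumerate(joint):
                prob *= strategies[i][a]
            total += prob * team_utility(joint + (a_adv,))
        values.append(total)
    return min(values)


# Hypothetical game with n = 3 players and m = 2 actions: the team wins
# when both teammates match and the adversary picks the other action.
def team_utility(joint):
    a1, a2, a_adv = joint
    return 1.0 if a1 == a2 and a1 != a_adv else 0.0


# A correlated strategy that is maxmin for this toy game (checked by hand).
p = {(0, 0): 0.5, (1, 1): 0.5}
mixed = round_correlated_to_mixed(p, [2, 2])
v_corr = correlated_value(team_utility, p, 2)
v_mixed = mixed_value(team_utility, mixed, 2)
print(mixed, v_corr, v_mixed)
```

On this toy instance the correlated strategy guarantees 0.5 and the rounded mixed strategies guarantee 0.25, so the ratio is 2, which equals $m^{n-2}$ for $m = 2$ and $n = 3$; the guarantee of Theorem 3 is therefore met with equality here.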

Publication date: 2017